Automatic lips reading for audio-visual speech processing and recognition
Author
Abstract
This contribution describes a method for automatic lip reading from video. Its output is used in subsequent audio-visual speech processing and recognition. A simple image processing method first locates the human face in the video frame. The lips are then located inside a region of interest within the marked face, using a gradient method based on the image histogram. The histogram is computed from the colour values of the region of interest. First results for visual speech recognition of isolated words are presented in the conclusion. The face and lip detection described here was used to support speech recognition.
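The abstract does not spell out how the histogram gradient is used, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes the region of interest is a grid of integer values (0 to 255) from a single colour channel in which lip pixels are brighter than skin pixels, and it takes the deepest valley of the smoothed histogram, found from the sign change of its gradient, as the threshold. All function names and the sample values are hypothetical.

```python
from typing import List


def channel_histogram(roi: List[List[int]], bins: int = 256) -> List[int]:
    """Count how often each colour value occurs in the region of interest."""
    counts = [0] * bins
    for row in roi:
        for value in row:
            counts[value] += 1
    return counts


def smooth(hist: List[int], radius: int = 2) -> List[float]:
    """Moving average, so single-bin noise does not create false valleys."""
    out = []
    for i in range(len(hist)):
        lo, hi = max(0, i - radius), min(len(hist), i + radius + 1)
        out.append(sum(hist[lo:hi]) / (hi - lo))
    return out


def valley_threshold(hist: List[int], radius: int = 2) -> int:
    """Return the deepest histogram valley between the occupied bins.

    A valley is a bin where the gradient of the smoothed histogram
    changes from negative to non-negative.
    """
    occupied = [i for i, count in enumerate(hist) if count > 0]
    if not occupied:
        raise ValueError("empty histogram")
    first, last = occupied[0], occupied[-1]
    s = smooth(hist, radius)
    grad = [s[i + 1] - s[i] for i in range(len(s) - 1)]
    valleys = [
        i for i in range(1, len(grad))
        if first < i < last and grad[i - 1] < 0 <= grad[i]
    ]
    if not valleys:
        raise ValueError("histogram has no interior valley")
    return min(valleys, key=lambda i: s[i])


def lip_mask(roi: List[List[int]], threshold: int) -> List[List[bool]]:
    """Mark pixels above the threshold (assumed here to be lip pixels)."""
    return [[value > threshold for value in row] for row in roi]


# Hypothetical region of interest: skin values near 60, lip values near 180.
roi = [
    [60, 61, 59, 60, 62, 58],
    [60, 180, 182, 178, 180, 61],
    [59, 181, 180, 179, 180, 60],
    [60, 62, 58, 61, 60, 59],
]
threshold = valley_threshold(channel_histogram(roi))
mask = lip_mask(roi, threshold)
```

In this toy example the threshold falls in the empty gap between the two clusters of values, so the mask selects only the eight bright pixels in the middle of the grid. Real frames would need a colour channel chosen so that lips and skin separate, which the abstract does not specify.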
Similar resources
Designing and implementing a system for Automatic recognition of Persian letters by Lip-reading using image processing methods
For many years, speech has been the most natural and efficient means of information exchange for human beings. With the advancement of technology and the prevalence of computer usage, researchers have turned to the design and production of speech recognition systems. Among these, lip-reading techniques face many challenges for speech recognition; one of the challenges b...
3D Lip-tracking for Audio-visual Speech Recognition in Real Applications
In this paper, we present a solution to the problem of tracking 3D information about the shape of the lips from a 2D picture of a speaker. We focus on lip-tracking of audio-visual speech recordings from the Czech in-vehicle audio-visual speech corpus (CIVAVC). The corpus consists of 4 h 40 min of audio-visual speech recordings of a driver, made in a car while driving in ordinary traffic. In real condit...
Lip-reading from parametric lip contours for audio-visual speech recognition
This paper describes the incorporation of a visual lip tracking and lip-reading algorithm that utilizes the affine-invariant Fourier descriptors from parametric lip contours to improve the audio-visual speech recognition systems. The audio-visual speech recognition system presented here uses parallel hidden Markov models (HMMs), where a joint decision, using an optimal decision rule, is made af...
‘vVISWa’ – A Multilingual Multi-Pose Audio Visual Database for Robust Human Computer Interaction
Automatic Speech Recognition (ASR) by machine is an attractive research topic in the signal processing domain and has attracted many researchers to contribute to this area of signal processing and pattern recognition. In recent years, there have been many advances in automatic speech reading systems with the inclusion of audio and visual speech features to recognize words under noisy conditions. The ...
Resource aware design of a deep convolutional-recurrent neural network for speech recognition through audio-visual sensor fusion
Today’s Automatic Speech Recognition systems only rely on acoustic signals and often don’t perform well under noisy conditions. Performing multi-modal speech recognition processing acoustic speech signals and lip-reading video simultaneously significantly enhances the performance of such systems, especially in noisy environments. This work presents the design of such an audio-visual system for ...